Implicit readability ranking using the latent variable of a Bayesian Probit model

Authors

  • Johan Falkenjack
  • Arne Jönsson
Abstract

Data-driven approaches to readability analysis for languages other than English have been plagued by a scarcity of suitable corpora. Often, the relevant corpora consist only of easy-to-read texts with no rank information or empirical readability scores, so that only binary approaches, such as classification, are applicable. We propose a Bayesian latent variable approach to get the most out of these kinds of corpora. In this paper we present results on using such a model for readability ranking. The model is evaluated on a preliminary corpus of ranked student texts, with encouraging results. We also assess the model by showing that it performs readability classification on par with a state-of-the-art classifier while at the same time being transparent enough to allow more sophisticated interpretations.
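The abstract does not spell out the model, so the following is only a minimal sketch of how a Bayesian probit model with a latent variable could be fitted to binary labels and then reused for ranking. It assumes the standard data-augmentation Gibbs sampler for probit regression (Albert and Chib), a zero-mean Gaussian prior on the weights, and a hypothetical feature matrix `X` with labels `y` (1 = easy-to-read, 0 = other). The function names, the prior variance, and the choice to rank by the posterior-mean linear predictor are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from statistics import NormalDist

_STD = NormalDist()  # standard normal, from the Python standard library
_cdf = np.vectorize(_STD.cdf, otypes=[float])
_inv_cdf = np.vectorize(_STD.inv_cdf, otypes=[float])


def sample_truncated_latent(mean, y, rng):
    """Draw z ~ N(mean, 1), truncated to z > 0 where y == 1 and z <= 0 where y == 0."""
    p0 = _cdf(-mean)  # P(z <= 0) under N(mean, 1)
    u = rng.uniform(size=mean.shape)
    # Map u into the admissible part of the CDF, then invert it.
    q = np.where(y == 1, p0 + u * (1.0 - p0), u * p0)
    q = np.clip(q, 1e-12, 1.0 - 1e-12)  # keep inv_cdf inside its open domain
    return mean + _inv_cdf(q)


def gibbs_probit(X, y, n_iter=400, burn_in=100, prior_var=10.0, seed=0):
    """Gibbs sampler for probit regression with prior beta ~ N(0, prior_var * I).

    Returns the posterior means of the weights and of the latent variables.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Covariance of beta given z does not depend on z, so compute it once.
    V = np.linalg.inv(X.T @ X + np.eye(d) / prior_var)
    L = np.linalg.cholesky(V)
    beta = np.zeros(d)
    beta_sum = np.zeros(d)
    z_sum = np.zeros(n)
    for it in range(n_iter):
        z = sample_truncated_latent(X @ beta, y, rng)          # z | beta, y
        beta = V @ (X.T @ z) + L @ rng.standard_normal(d)      # beta | z
        if it >= burn_in:
            beta_sum += beta
            z_sum += z
    kept = n_iter - burn_in
    return beta_sum / kept, z_sum / kept


def rank_by_latent(X_new, beta_mean):
    """Return indices of texts ordered from highest to lowest latent score."""
    return np.argsort(-(X_new @ beta_mean))
```

Under these assumptions, the sign of the latent score gives the binary classification, while its magnitude provides a continuous ordering even though the training labels were only binary; this is one way to read the "implicit ranking" in the title.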


Similar resources

The Analysis of Bayesian Probit Regression of Binary and Polychotomous Response Data

The goal of this study is to introduce a statistical method regarding the analysis of specific latent data for regression analysis of the discrete data and to build a relation between a probit regression model (related to the discrete response) and normal linear regression model (related to the latent data of continuous response). This method provides precise inferences on binary and multinomia...


Application of Bayesian Latent Variable Model for Early Detection of Gestational Diabetes Mellitus Without A Perfect Reference Standard Test by β‐human Chorionic Gonadotropin

Background and Objectives: Gestational diabetes mellitus (GDM) is a medical problem in pregnancy, and its late diagnosis can cause adverse effects in the mother and fetus. The purpose of this research was to estimate the accuracy parameters of a biomarker for early prediction of gestational diabetes in the absence of a perfect reference standard test. Methods: This study was conducted in 52...


A Multivariate Probit Latent Variable Model for Analyzing Dichotomous Responses

We propose a multivariate probit model that is defined by a confirmatory factor analysis model with covariates for analyzing dichotomous data in medical research. Our proposal is a generalization of several useful multivariate probit models, and provides a flexible framework for practical applications. We implement a Monte Carlo EM algorithm for maximum likelihood estimation of the model, and d...


Gender-based Differences in Associations between Attitude and Self-esteem with Smoking Behavior among Adolescents: A Secondary Analysis Applying Bayesian Nonparametric Functional Latent Variable Model

Background: Different patterns of gender-based relationships between attitude toward smoking and self-esteem with smoking behavior have been reported. However, such associations may be much more complex than a simple assumed linear relationship. We aimed to propose a method of providing hand details on the total and gender-based scenarios of the relationships between attitude toward smoking and sel...


Mean-field variational approximate Bayesian inference for latent variable models

The ill-posed nature of missing variable models offers a challenging testing ground for new computational techniques. This is the case for mean-field variational Bayesian inference. The behavior of this approach in the setting of the Bayesian probit model is illustrated. It is shown that the mean-field variational method always underestimates the posterior variance and that, for small samp...




Publication date: 2016